Learning to Extract Motion from Videos in Convolutional Neural Networks
This paper shows how to extract dense optical flow from videos with a
convolutional neural network (CNN). The proposed model constitutes a potential
building block for deeper architectures to allow using motion without resorting
to an external algorithm, e.g., for recognition in videos. We derive our network
architecture from signal processing principles to provide desired invariances
to image contrast, phase and texture. We constrain weights within the network
to enforce strict rotation invariance and substantially reduce the number of
parameters to learn. We demonstrate end-to-end training on only 8 sequences of
the Middlebury dataset, orders of magnitude less than competing CNN-based
motion estimation methods, and obtain comparable performance to classical
methods on the Middlebury benchmark. Importantly, our method outputs a
distributed representation of motion that can represent multiple transparent
motions and dynamic textures. Our contributions on network design and rotation
invariance offer insights that are not specific to motion estimation.
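The weight-constraint idea can be illustrated with a minimal sketch (hypothetical, not the paper's actual architecture): a single base kernel is shared across its 90-degree rotations and the responses are pooled over orientation, so only the base kernel's parameters are learned and the pooled response map rotates consistently with the input.

```python
import numpy as np

# Hypothetical sketch: share one learned base kernel across rotated
# copies and max-pool over orientations. Only the base kernel is
# learned, and the pooled response rotates consistently with the input.
def corr2d(img, k):
    # Valid-mode 2D cross-correlation.
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def rotation_invariant_response(img, base_kernel):
    # Max-pool over the four 90-degree rotations of one shared kernel.
    responses = [corr2d(img, np.rot90(base_kernel, m)) for m in range(4)]
    return np.max(np.stack(responses), axis=0)

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
kernel = rng.standard_normal((3, 3))
resp = rotation_invariant_response(img, kernel)

# Rotating the input rotates the pooled response map accordingly.
print(np.allclose(rotation_invariant_response(np.rot90(img), kernel),
                  np.rot90(resp)))  # True
```

The parameter saving is the point of the constraint: one 3x3 kernel stands in for four orientation-specific filters.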
Accuracy of Anthropometric Measurements by a Video-based 3D Modelling Technique
The use of anthropometric measurements to understand an individual’s body shape and size is increasingly common in health assessment, product design, and biomechanical analysis. Non-contact, three-dimensional (3D) scanning, which can obtain individual human models, has been
widely used as a tool for automatic anthropometric measurement. Recently,
Alldieck et al. (2018) developed a video-based 3D modelling technique, enabling
the generation of individualised human models for virtual reality purposes. As
the technique is based on standard video images, hardware requirements are minimal, increasing the flexibility of the technique’s applications. The aim of this
study was to develop an automated method for acquiring anthropometric measurements from models generated using a video-based 3D modelling technique
and to determine the accuracy of the developed method. Each participant’s anthropometry was measured manually by accredited operators as the reference values. Sequential images for each participant were captured and used as input data
to generate personal 3D models, using the video-based 3D modelling technique.
Bespoke scripts were developed to obtain corresponding anthropometric data
from the generated 3D models. When comparing manual measurements with those
extracted using the developed method, the method showed sufficient accuracy to
be a potential alternative to anthropometry using existing commercial solutions.
However, further development, aimed at improving modelling accuracy and processing speed, is still warranted.
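A typical building block of such measurement scripts (a hypothetical sketch, not the paper's bespoke code) is extracting a girth from a 3D model: slice the point cloud at a given height and take the perimeter of the slice's 2D convex hull.

```python
import numpy as np

# Hypothetical sketch: estimate a girth measurement from a 3D point
# cloud by slicing at a given height and measuring the perimeter of
# the slice's 2D convex hull (Andrew's monotone chain).
def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    pts = sorted(map(tuple, points))
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def girth_at_height(cloud, height, band=0.005):
    # Keep points within a thin horizontal band, hull them in 2D,
    # and sum the hull's edge lengths.
    slice_pts = cloud[np.abs(cloud[:, 2] - height) < band][:, :2]
    hull = convex_hull(slice_pts)
    return np.sum(np.linalg.norm(np.roll(hull, -1, axis=0) - hull, axis=1))

# Synthetic "torso": a cylinder of radius 0.4 m, so the true
# circumference at any height is 2*pi*0.4 ≈ 2.513 m.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ring = np.column_stack([0.4 * np.cos(theta), 0.4 * np.sin(theta)])
z = np.repeat(np.linspace(0.9, 1.1, 21), 200)
cloud = np.column_stack([np.tile(ring, (21, 1)), z])
print(round(girth_at_height(cloud, 1.0), 3))  # 2.513
```

On a real scan the hull slightly overestimates concave cross-sections, which is one reason tape-measure comparisons like those in the study are needed.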
Zero-Shot Task Transfer
In this work, we present a novel meta-learning algorithm,
TTNet,
that regresses model parameters for novel tasks for
which no ground truth is available (zero-shot tasks). In
order to adapt to novel zero-shot tasks, our meta-learner
learns from the model parameters of known tasks (with
ground truth) and the correlation of known tasks to zero-shot tasks. This intuition finds its foothold in cognitive science, where a subject (a human infant) can adapt to a novel
concept (depth understanding) by correlating it with old
concepts (hand movement or self-motion), without receiving explicit supervision. We evaluated our model on the
Taskonomy dataset, with four tasks as zero-shot: surface
normal, room layout, depth and camera pose estimation.
These tasks were chosen based on the data acquisition complexity and the complexity associated with the learning process using a deep network. Our proposed methodology outperforms state-of-the-art models (which use ground truth)
on each of our zero-shot tasks, showing promise on zero-shot task transfer. We also conducted extensive experiments
to study the various choices of our methodology, as well as
showed how the proposed method can also be used in transfer learning. To the best of our knowledge, this is the first
such effort on zero-shot learning in the task space.
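The core idea can be caricatured in a few lines (a drastic simplification of TTNet, with made-up numbers): estimate a zero-shot task's parameters as a correlation-weighted combination of known tasks' parameters, using no labels from the novel task.

```python
import numpy as np

# Toy illustration (not TTNet itself): regress a zero-shot task's
# parameters from known-task parameters and assumed task correlations.
rng = np.random.default_rng(0)
dim = 8

# Ground-truth weight vectors of three known linear tasks.
known_w = rng.standard_normal((3, dim))

# Assumed correlations of the known tasks with the zero-shot task.
corr = np.array([0.7, 0.2, 0.1])

# Simplest possible meta-"regression": a convex combination of the
# known tasks' parameters, weighted by correlation.
novel_w = (corr / corr.sum()) @ known_w

# If the zero-shot task really is such a mixture, the regressed
# parameters fit its never-seen data, with no novel-task supervision.
X = rng.standard_normal((100, dim))
true_w = 0.7 * known_w[0] + 0.2 * known_w[1] + 0.1 * known_w[2]
err = np.mean((X @ novel_w - X @ true_w) ** 2)
print(err < 1e-12)  # True (exact mixture by construction)
```

TTNet replaces this fixed convex combination with a learned regressor over full network parameters, but the role of task correlation is the same.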
Dynamic texture recognition using time-causal spatio-temporal scale-space filters
This work presents an evaluation of using time-causal scale-space filters as primitives for video analysis. For this purpose, we present a new family of video descriptors based on regional statistics of spatiotemporal scale-space filter responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain. We evaluate one member in this family, constituting a joint binary histogram, on two widely used dynamic texture databases. The experimental evaluation shows competitive performance compared to previous methods for dynamic texture recognition, especially on the more complex DynTex database. These results support the descriptive power of time-causal spatio-temporal scale-space filters as primitives for video analysis.
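The joint-binary-histogram construction can be sketched roughly as follows (with simple finite differences standing in for the paper's time-causal spatio-temporal filters): binarise K filter responses, pack them into K-bit joint codes, and histogram the codes over a region.

```python
import numpy as np

# Rough sketch of the descriptor family. The three placeholder filters
# are plain finite differences along t, y and x, standing in for
# time-causal spatio-temporal scale-space derivatives.
def joint_binary_histogram(video, filters):
    # video: (T, H, W); each filter maps the video to a response volume.
    bits = [(f(video) > 0).astype(np.int64) for f in filters]
    codes = np.zeros_like(bits[0])
    for k, b in enumerate(bits):
        codes += b << k  # pack K binary responses into a K-bit code
    hist = np.bincount(codes.ravel(), minlength=2 ** len(bits))
    return hist / hist.sum()  # normalised joint histogram

filters = [
    lambda v: np.diff(v, axis=0, prepend=v[:1]),
    lambda v: np.diff(v, axis=1, prepend=v[:, :1]),
    lambda v: np.diff(v, axis=2, prepend=v[:, :, :1]),
]
rng = np.random.default_rng(1)
video = rng.standard_normal((8, 16, 16))
h = joint_binary_histogram(video, filters)
print(h.shape, round(h.sum(), 6))  # (8,) 1.0
```

Descriptors of this form are then compared between videos (e.g. by histogram distances) to classify dynamic textures; the paper's actual filters additionally respect temporal causality, which this sketch does not model.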